Run

A collection can be executed multiple times. A Run is a single execution of a collection.

Endpoints

POST /v1/async/collections/{collection_id}/run
GET /v1/async/collections/{collection_id}/runs
GET /v1/async/collections/{collection_id}/runs/{run_id}
GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs
GET /v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result

POST `/v1/async/collections/{collection_id}/run`

This endpoint triggers a Run of a collection.

A collection_id is required to make this request.

Response Example:

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "in_progress",
  "total_requests": 2,
  "success_requests": 0,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
  "job_ids": ["job-uuid-1", "job-uuid-2"],
  "callback_url": "https://your-server.com/webhook",
  "callback_status": "pending"
}

Details about the returned fields can be found in the Reference section below.
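Here is a minimal sketch of triggering a Run from Python using only the standard library. It assumes the POST needs no request body, since the section above names only the collection_id path parameter. It uses the Bearer header from the curl examples on this page. `build_run_request` and `start_run` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://api.scrapingpros.com"


def build_run_request(collection_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST /v1/async/collections/{collection_id}/run request."""
    url = f"{BASE_URL}/v1/async/collections/{collection_id}/run"
    # Assumption: no JSON body is required, only the path parameter and auth header.
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def start_run(collection_id: str, api_key: str) -> dict:
    """Trigger a Run and return the parsed response (run_id, status, job_ids, ...)."""
    req = build_run_request(collection_id, api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling `start_run(collection_id, api_key)` should return a body like the example above. Store its run_id (together with the collection_id) right away so you can poll the run or re-attach to it later.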

GET `/v1/async/collections/{collection_id}/runs`

Lists every run of a given collection, newest first.

This endpoint is useful for two things:

Audit and dashboards: see every time a collection has been executed.
Recovery after a submit timeout: if your POST /run request lost its response but you persisted the collection_id, re-attach to the live run with ?status_filter=in_progress instead of triggering a duplicate.

# All runs
curl 'https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs' \
  -H 'Authorization: Bearer <API-KEY>'

# Just the live run
curl 'https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs?status_filter=in_progress' \
  -H 'Authorization: Bearer <API-KEY>'

Response Example:

{
  "items": [
    {
      "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
      "status": "in_progress",
      "total_requests": 100,
      "success_requests": 73,
      "failed_requests": 5,
      "timeout_requests": 0,
      "collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
      "callback_url": null,
      "callback_status": null,
      "created_at": 1777853217.82
    }
  ],
  "total": 1
}
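The recovery pattern described above can be written as a "find or start" helper. It lists the collection's in_progress runs first and only POSTs a new run when none is live. This is a minimal sketch that assumes the list response shape shown above. The `http_json` callable is a hypothetical stand-in for your HTTP client; it is injected so the logic can be exercised without network access.

```python
from typing import Callable, Optional

# http_json(method, path) -> parsed JSON body; supplied by the caller (hypothetical).
HttpJson = Callable[[str, str], dict]


def find_live_run(http_json: HttpJson, collection_id: str) -> Optional[dict]:
    """Return the newest in_progress run of the collection, or None."""
    page = http_json(
        "GET",
        f"/v1/async/collections/{collection_id}/runs?status_filter=in_progress",
    )
    items = page.get("items", [])
    # The listing is ordered newest first, so the first item is the latest live run.
    return items[0] if items else None


def find_or_start_run(http_json: HttpJson, collection_id: str) -> dict:
    """Re-attach to a live run if one exists; otherwise trigger a new one."""
    live = find_live_run(http_json, collection_id)
    if live is not None:
        return live
    return http_json("POST", f"/v1/async/collections/{collection_id}/run")
```

The check and the POST are not atomic. Two clients racing each other could still both start a run, so this only guards the single-client retry case described above.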

GET `/v1/async/collections/{collection_id}/runs/{run_id}`

This endpoint returns the current status of a Run, including the webhook delivery status.

Response Example:

{
  "run_id": "9b64941a-4545-4c57-9174-c70e781d9192",
  "status": "completed",
  "total_requests": 2,
  "success_requests": 2,
  "failed_requests": 0,
  "timeout_requests": 0,
  "collection_id": "9634997b-6431-4b11-a4cb-fc00e941ba8d",
  "job_ids": ["job-uuid-1", "job-uuid-2"],
  "callback_url": "https://your-server.com/webhook",
  "callback_status": "sent"
}
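A polling sketch built on this endpoint fetches the Run until `status` becomes `completed` and reports progress along the way. It assumes that `success_requests`, `failed_requests`, and `timeout_requests` are disjoint counts, which is consistent with the success_requests definition in the Reference section. Under that assumption, their sum over `total_requests` gives a progress fraction. `fetch_run` is a hypothetical zero-argument callable that performs the GET and returns the parsed JSON. The poll interval and deadline are arbitrary choices.

```python
import time
from typing import Callable


def run_progress(run: dict) -> float:
    """Fraction of requests that have finished (success, failure, or timeout)."""
    total = run.get("total_requests") or 0
    if total == 0:
        return 0.0
    done = run["success_requests"] + run["failed_requests"] + run["timeout_requests"]
    return done / total


def wait_for_run(
    fetch_run: Callable[[], dict],
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_run() until status == "completed" or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()  # e.g. GET .../runs/{run_id}, parsed as JSON
        if run["status"] == "completed":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run['run_id']} still {run['status']}")
        print(f"{run_progress(run):.0%} done")
        sleep(interval_s)
```

If the collection has a callback_url configured, the webhook mechanism can replace this polling loop.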

GET `/v1/async/collections/{collection_id}/runs/{run_id}/jobs`

Lists all jobs of a run with cursor-based pagination. Each item carries metadata (URL, status, timings, custom_id, validator fields) but not the HTML body. Use the /result endpoint below to download content.

Query parameters:

| Param | Type | Default | Description |
|---|---|---|---|
| `cursor` | string | (none) | Opaque cursor returned by the previous page (`cursor_next`). Omit it on the first call. Its encoding depends on `order_by`, so sending a cursor with a different `order_by` than the one that produced it returns 400. |
| `limit` | integer | 100 | Page size. Min 1, max 1000. |
| `status_filter` | string / CSV | (none) | A single value or a comma-separated list of `completed`, `failed`, `timeout`, `processing`. Example: `status_filter=completed,failed,timeout`. |
| `since_completed_at` | ISO 8601 string | (none) | Returns only rows whose `completed_at` is strictly greater than this value. Accepts a `Z` suffix, a `+00:00` offset, or a naive timestamp (interpreted as UTC). Rows with NULL `completed_at` are excluded. |
| `order_by` | `id` \| `completed_at` | `id` | Sort order. Use `completed_at` to stream completions as they finish. |
| `order_dir` | `asc` \| `desc` | `asc` | Honored only when `order_by=completed_at`. |
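The `order_by=completed_at` and `since_completed_at` parameters can be combined into an incremental loop that only fetches jobs finished since the last sweep. This is a minimal sketch that assumes you persist a watermark (the latest `completed_at` seen) between sweeps. Jobs reach this listing roughly 5 seconds after completing, so the sketch re-reads a short overlap window and de-duplicates on `job_public_id`. The 30-second overlap and the helper names `sweep_params` and `absorb` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical safety margin: jobs reach the listing ~5 s after completing,
# so each sweep re-reads a short window before the watermark.
OVERLAP = timedelta(seconds=30)
FINAL_STATUSES = "completed,failed,timeout"


def sweep_params(watermark: Optional[datetime], cursor: Optional[str] = None,
                 limit: int = 500) -> dict:
    """Query parameters for one page of an incremental completions sweep."""
    params = {
        "limit": limit,
        "status_filter": FINAL_STATUSES,
        "order_by": "completed_at",
        "order_dir": "asc",
    }
    if watermark is not None:
        # A naive ISO 8601 timestamp is interpreted as UTC by the API.
        params["since_completed_at"] = (watermark - OVERLAP).isoformat()
    if cursor:
        # Cursors are only valid with the order_by that produced them.
        params["cursor"] = cursor
    return params


def absorb(items: Iterable[dict], seen: Set[str],
           watermark: Optional[datetime]) -> Tuple[List[dict], Optional[datetime]]:
    """Drop already-seen jobs and advance the watermark to the latest completed_at."""
    fresh: List[dict] = []
    for job in items:
        job_id = job["job_public_id"]
        if job_id in seen:
            continue  # re-read because of the overlap window
        seen.add(job_id)
        fresh.append(job)
        # Assumes naive timestamps such as "2026-04-23T12:00:03.637" (UTC).
        done = datetime.fromisoformat(job["completed_at"])
        if watermark is None or done > watermark:
            watermark = done
    return fresh, watermark
```

Use the same starting watermark for every page of one sweep, so `since_completed_at` does not change while you follow `cursor_next`. Adopt the watermark returned by `absorb` only after `has_more` is false.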

Response example:

{
  "items": [
    {
      "job_public_id": "e3a1b2c4-...",
      "run_public_id": "9b64941a-...",
      "collection_id": "9634997b-...",
      "status": "completed",
      "url": "https://example.com/tours/123",
      "custom_id": "tour_12345",
      "url_truncated": false,
      "status_code": 200,
      "message": null,
      "queued_at": "2026-04-23T12:00:00.123",
      "started_at": "2026-04-23T12:00:02.267",
      "completed_at": "2026-04-23T12:00:03.637",
      "execution_time_ms": 1370,
      "retries_attempted": 0,
      "block_reason": null,
      "protection_stack": ["cloudflare"],
      "rule_hits": []
    }
  ],
  "cursor_next": "MzQ=",
  "has_more": true
}

Timing: jobs appear in this listing roughly 5 seconds after completion (internal metadata flusher tick). The ordered sequence of queued_at → started_at → completed_at lets you compute queue wait time and execution latency per job.
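For example, you can derive queue wait and execution latency from the three timestamps of the sample job above. This minimal sketch assumes all three naive timestamps are in the same timezone (UTC) and that the job has finished, so `started_at` and `completed_at` are set. `job_latencies` is a hypothetical helper.

```python
from datetime import datetime


def _ms_between(start: str, end: str) -> int:
    """Milliseconds between two ISO 8601 timestamps from the jobs listing."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return round(delta.total_seconds() * 1000)


def job_latencies(job: dict) -> dict:
    """Queue wait (queued -> started) and execution time (started -> completed)."""
    return {
        "queue_wait_ms": _ms_between(job["queued_at"], job["started_at"]),
        "execution_ms": _ms_between(job["started_at"], job["completed_at"]),
    }


sample = {
    "queued_at": "2026-04-23T12:00:00.123",
    "started_at": "2026-04-23T12:00:02.267",
    "completed_at": "2026-04-23T12:00:03.637",
}
print(job_latencies(sample))  # {'queue_wait_ms': 2144, 'execution_ms': 1370}
```

For the sample job, the computed execution time matches the reported `execution_time_ms` of 1370.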

Retention: listing metadata is retained for 90 days after the run (MySQL partitioned tables). HTML bodies are retained for 48 hours. After that window, the /result endpoint returns 404, but the job still appears in the listing above.

Pagination pattern:

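import requests

# Placeholders: substitute your own IDs and API key.
jobs_url = "https://api.scrapingpros.com/v1/async/collections/{collection_id}/runs/{run_id}/jobs"
H = {"Authorization": "Bearer <API-KEY>"}

def handle(job):  # hypothetical: replace with your own processing
    print(job["job_public_id"], job["status"], job["url"])

cursor = None
while True:
    params = {"limit": 500}
    if cursor:
        params["cursor"] = cursor  # omit on the first call
    resp = requests.get(jobs_url, headers=H, params=params, timeout=30)
    resp.raise_for_status()
    page = resp.json()
    for job in page["items"]:
        handle(job)
    if not page["has_more"]:
        break
    cursor = page["cursor_next"]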

See the full reference at apiReference/scrapeo_asincronico.

GET `/v1/async/collections/{collection_id}/runs/{run_id}/jobs/{job_id}/result`

Retrieves the full result of a specific job (HTML body, extracted data, timings). Results are available for 48 hours after job completion.

Response Example:

{
  "url": "https://example.com/tours/123",
  "custom_id": "tour_12345",
  "html": "<!doctype html>...",
  "statusCode": 200,
  "timings": {"queue_wait_ms": 45, "proxy_ms": 120},
  "potentiallyBlockedByCaptcha": false,
  "extracted_data": null
}

The response includes url and custom_id so you can correlate each result back to your original request without relying on insertion order.
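The correlation this enables can be sketched as an index of fetched results keyed by `custom_id`. The key falls back to `url` when no custom_id was set, so lookups never depend on the order in which results arrive. This sketch assumes custom_id values (or URLs, when custom_id is absent) are unique within your collection. `index_results` is a hypothetical helper.

```python
from typing import Dict, Iterable


def index_results(results: Iterable[dict]) -> Dict[str, dict]:
    """Map each result to the key you submitted: custom_id if present, else url."""
    indexed: Dict[str, dict] = {}
    for result in results:
        key = result.get("custom_id") or result["url"]
        if key in indexed:
            # Two results share a key, so the uniqueness assumption does not hold.
            raise ValueError(f"duplicate result key: {key!r}")
        indexed[key] = result
    return indexed
```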

If the result is unavailable, the API responds with 404 and a structured detail that tells you which kind of unavailable it is:

HTTP 404
{
  "detail": {
    "error_code": "result_lost",
    "message": "Job result is unavailable due to a service incident during the completion window. Contact support if the data is critical — it may qualify for refund.",
    "completed_at": "2026-04-30T12:34:56Z",
    "age_hours": 0.4
  }
}

| `error_code` | Meaning | Suggested action |
|---|---|---|
| `result_pending` | The job is still in flight, or the worker has not stored a result yet. | Retry shortly. |
| `result_expired` | The job completed longer ago than the result retention window, so the body has been pruned. | Re-run the collection if you still need the data. |
| `result_lost` | The body is unavailable even though the job is still within the retention window. | Contact support; the job may qualify for a refund. |
| `job_id_invalid` | There is no record of this job. | Verify the IDs in your client. |
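One way to act on these codes is to retry `result_pending` with a bounded backoff and treat the other codes as terminal. In this minimal sketch, `fetch_result` is a hypothetical callable that returns `(http_status, parsed_body)`. The attempt count and delays are assumptions, not documented limits.

```python
import time
from typing import Callable, Tuple


class ResultUnavailable(Exception):
    """Raised when a job result cannot be retrieved."""

    def __init__(self, error_code: str, detail: dict):
        super().__init__(f"{error_code}: {detail.get('message', '')}")
        self.error_code = error_code
        self.detail = detail


def get_result(
    fetch_result: Callable[[], Tuple[int, dict]],
    attempts: int = 5,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Return the result body, retrying only while the API says result_pending."""
    detail: dict = {}
    for attempt in range(attempts):
        status, body = fetch_result()
        if status == 200:
            return body
        raw = body.get("detail")
        detail = raw if isinstance(raw, dict) else {}
        code = detail.get("error_code", f"http_{status}")
        if code != "result_pending":
            # result_expired, result_lost, job_id_invalid, or an unexpected error.
            raise ResultUnavailable(code, detail)
        if attempt < attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # 2 s, 4 s, 8 s, ... by default
    raise ResultUnavailable("result_pending", detail)
```

A caller can then branch on `error.error_code`. For instance, it might flag `result_lost` jobs for a support request and re-run the collection for `result_expired` ones.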

Webhooks

If the collection has a callback_url configured, a signed HTTP POST is automatically sent upon run completion:

{
  "event": "run.completed",
  "run_id": "uuid",
  "collection_id": "uuid",
  "status": "completed",
  "total_requests": 2,
  "success_requests": 2,
  "failed_requests": 0,
  "job_ids": ["job-uuid-1", "job-uuid-2"],
  "results_url": "https://api.scrapingpros.com/v1/async/collections/{cid}/runs/{rid}",
  "timestamp": "2026-04-06T20:30:00Z"
}

Security: each webhook request carries an HMAC-SHA256 signature in its headers. X-SP-Signature is the signature of the string {timestamp}.{body}:

X-SP-Signature: sha256=<hex>
X-SP-Timestamp: <unix_epoch>
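Here is a verification sketch for the receiving side. The text above specifies the algorithm (HMAC-SHA256 over `{timestamp}.{body}`) and the headers, but not the signing key. The sketch therefore assumes a per-account webhook secret shared with you out of band. It also assumes `{timestamp}` is the X-SP-Timestamp value and `{body}` is the raw request body exactly as received. An optional freshness check is included but disabled by default, because the docs do not say whether retried deliveries carry a fresh X-SP-Timestamp. `verify_webhook` is a hypothetical helper.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(secret: bytes, raw_body: bytes, signature_header: str,
                   timestamp_header: str, max_age_s: Optional[float] = None,
                   now: Optional[float] = None) -> bool:
    """Return True if X-SP-Signature matches HMAC-SHA256 of "{timestamp}.{body}"."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    if max_age_s is not None:
        # Optional replay guard; disabled unless max_age_s is given.
        try:
            ts = int(timestamp_header)
        except ValueError:
            return False
        current = time.time() if now is None else now
        if abs(current - ts) > max_age_s:
            return False
    # Sign the raw bytes exactly as received: "{timestamp}.{body}".
    signed = timestamp_header.encode("ascii") + b"." + raw_body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    # Constant-time comparison, so timing does not reveal matching digits.
    return hmac.compare_digest(expected, signature_header[len(prefix):])
```

Pass the unparsed request body bytes from your web framework. Re-serializing parsed JSON can change whitespace or key order and break the signature.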

Retries: if delivery fails (timeout or 5xx response), it is retried automatically up to 5 times with backoff delays of 1 min, 5 min, 30 min, 2 h, and 12 h. The callback_status field reflects the current delivery status.
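Worked out, the schedule spans more than half a day. This small sketch computes when each retry fires. It assumes each delay is counted from the previous attempt; the text does not say whether the delays are cumulative.

```python
from datetime import timedelta
from itertools import accumulate

# Backoff delays between attempts, as listed above.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
]

# Offset of each retry from the first failed delivery, assuming each delay
# is measured from the previous attempt.
offsets = list(accumulate(RETRY_DELAYS))
for n, offset in enumerate(offsets, start=1):
    print(f"retry {n}: +{offset}")
```

Under that assumption, the last retry arrives about 14 h 36 min after the first failure. A delivery that timed out on the sender's side may already have been processed by your server. For that reason, webhook handlers should be idempotent, for example keyed on run_id.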

Reference

run_id: Generated UUID of the run. Use this value to track the run with GET /v1/async/collections/{collection_id}/runs/{run_id}.
status: The current status of the Run. It takes one of two values: in_progress or completed.
total_requests: Number of requests in the collection.
success_requests: Number of requests that delivered usable content (HTTP 2xx + no block signal). A job whose worker completed but whose target returned 4xx/5xx or a captcha page is counted under failed_requests, not here.
failed_requests: Number of requests that failed, including jobs whose target returned 4xx/5xx or a captcha page (see success_requests).
timeout_requests: Number of requests that timed out.
collection_id: UUID of the collection.
job_ids: List of UUIDs of the individual jobs. Use them to retrieve results with the job result endpoint. The list is available for the lifetime of the run, regardless of status. You can always enumerate the jobs of a run, even after status=completed and after the result bodies have expired (listing metadata is kept for 90 days).
callback_url: Configured webhook URL (if set).
callback_status: Webhook delivery status: pending (in progress), sent (delivered), failed (delivery failed), or retrying (retrying delivery). It is null when no callback_url is configured.
